
    The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems

    Social bookmarking systems have established themselves as an important part of today’s Web. In such systems, tag recommender systems support users during the posting of a resource by suggesting suitable tags. Tag recommender algorithms have often been evaluated in offline benchmarking experiments, yet the particular setup of such experiments has rarely been analyzed. In particular, since recommendation quality usually suffers from difficulties such as data sparsity or the cold-start problem for new resources or users, datasets have often been pruned to so-called cores (specific subsets of the original datasets), without much consideration of the implications for the benchmarking results. In this article, we generalize the notion of a core by introducing the new notion of a set-core, which is independent of any graph structure, to overcome a structural drawback in previous constructions of cores on tagging data. We show that problems caused by some types of cores can be eliminated using set-cores. Further, we present a thorough analysis of tag recommender benchmarking setups using cores. To that end, we conduct a large-scale experiment on four real-world datasets, in which we analyze the influence of different cores on the evaluation of recommendation algorithms. We show that the results of comparing different recommendation approaches depend on the selected core type and level. For the benchmarking of tag recommender algorithms, our results suggest that the evaluation must be set up more carefully and should not be based on one arbitrarily chosen core type and level.
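    The pruning described above can be sketched as follows. This is a minimal illustration (not the article's set-core construction), assuming posts are (user, tag, resource) triples and that a core at level k keeps only those entities occurring in at least k of the remaining posts, pruned iteratively until stable:

```python
from collections import Counter

def core(posts, k):
    """Iteratively prune (user, tag, resource) triples until every
    remaining user, tag, and resource occurs in at least k triples."""
    posts = list(posts)
    while True:
        # occurrence counts per position: users, tags, resources
        counts = [Counter(p[i] for p in posts) for i in range(3)]
        kept = [p for p in posts
                if all(counts[i][p[i]] >= k for i in range(3))]
        if len(kept) == len(posts):  # fixed point reached
            return kept
        posts = kept
```

    Dropping a post whose tag is too rare may in turn push a user below the threshold, so pruning cascades; this is precisely why the chosen core level can change benchmarking outcomes.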

    Posted, Visited, Exported: Altmetrics in the Social Tagging System BibSonomy

    In social tagging systems, like Mendeley, CiteULike, and BibSonomy, users can post, tag, visit, or export scholarly publications. In this paper, we compare citations with metrics derived from users’ activities (altmetrics) in the popular social bookmarking system BibSonomy. Our analysis, using a corpus of more than 250,000 publications published before 2010, reveals that overall, citations and altmetrics in BibSonomy are mildly correlated. Furthermore, grouping publications by user-generated tags results in topic-homogeneous subsets that exhibit higher correlations with citations than the full corpus. We find that posts, exports, and visits of publications are correlated with citations and even bear predictive power over future impact. Machine learning classifiers predict whether the number of citations that a publication receives in a year exceeds the median number of citations in that year, based on the usage counts of the preceding year. In that setup, a Random Forest predictor outperforms the baseline on average by seven percentage points.
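    The labeling step of this setup can be sketched as follows; a hypothetical illustration (field names invented, not the paper's pipeline) that builds usage-count features and the above-median citation target on which a classifier such as scikit-learn's RandomForestClassifier would then be trained:

```python
from statistics import median

def build_dataset(pubs):
    """pubs: pub_id -> {"posts": .., "exports": .., "visits": ..,
    "citations": ..}, where citations are those of the following year.
    Features are the usage counts; the label is 1 iff the publication's
    citations exceed the median over all publications."""
    med = median(p["citations"] for p in pubs.values())
    X = [[p["posts"], p["exports"], p["visits"]] for p in pubs.values()]
    y = [int(p["citations"] > med) for p in pubs.values()]
    return X, y
```

    The median split makes the two classes balanced by construction, so a majority-class baseline sits at roughly 50% accuracy, which is the baseline a learned predictor must beat.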

    Discovering Implicational Knowledge in Wikidata

    Knowledge graphs have recently become the state-of-the-art tool for representing the diverse and complex knowledge of the world. Examples include the proprietary knowledge graphs of companies such as Google, Facebook, IBM, or Microsoft, but also freely available ones such as YAGO, DBpedia, and Wikidata. A distinguishing feature of Wikidata is that the knowledge is collaboratively edited and curated. While this greatly enhances the scope of Wikidata, it also makes it impossible for a single individual to grasp complex connections between properties or understand the global impact of edits in the graph. We apply Formal Concept Analysis to efficiently identify comprehensible implications that are implicitly present in the data. Although the complex structure of data modelling in Wikidata is not amenable to a direct approach, we overcome this limitation by extracting contextual representations of parts of Wikidata in a systematic fashion. We demonstrate the practical feasibility of our approach through several experiments and show that the results may lead to the discovery of interesting implicational knowledge. Besides providing a method for obtaining large real-world data sets for FCA, we sketch potential applications in offering semantic assistance for editing and curating Wikidata.
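    The kind of implication mined here can be checked directly against a formal context. A minimal sketch (not the paper's method), assuming the context is a mapping from objects to their attribute sets: an implication A → B holds iff every object carrying all of A also carries all of B.

```python
def extent(ctx, attrs):
    """Objects of ctx (object -> attribute set) having all of attrs."""
    return {g for g, atts in ctx.items() if attrs <= atts}

def holds(ctx, premise, conclusion):
    """The implication premise -> conclusion holds iff the extent of
    the premise is contained in the extent of the conclusion."""
    return extent(ctx, premise) <= extent(ctx, conclusion)
```

    In a Wikidata-flavoured context the objects would be items and the attributes their properties, so a valid implication expresses, e.g., that every item using one property also uses another.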

    On the Usability of Probably Approximately Correct Implication Bases

    We revisit the notion of probably approximately correct implication bases from the literature and present a first formulation in the language of formal concept analysis, with the goal to investigate whether such bases represent a suitable substitute for exact implication bases in practical use cases. To this end, we quantitatively examine the behavior of probably approximately correct implication bases on artificial and real-world data sets and compare their precision and recall with respect to their corresponding exact implication bases. Using a small example, we also provide qualitative insight that implications from probably approximately correct bases can still represent meaningful knowledge from a given data set.
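    One direction of such a comparison, the precision of a candidate base against the data, can be sketched as follows; a toy illustration (not the paper's evaluation pipeline), assuming implications are (premise, conclusion) pairs of attribute sets over a small context:

```python
def extent(ctx, attrs):
    """Objects of ctx (object -> attribute set) having all of attrs."""
    return {g for g, atts in ctx.items() if attrs <= atts}

def base_precision(ctx, base):
    """Fraction of implications (premise, conclusion) in `base` that
    actually hold in the context ctx."""
    valid = sum(extent(ctx, a) <= extent(ctx, b) for a, b in base)
    return valid / len(base) if base else 1.0
```

    Recall is the harder direction, since it asks how many exact implications are *entailed* by the approximate base rather than literally contained in it.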

    Propagation of Policies in Rich Data Flows

    Governing the life cycle of data on the web is a challenging issue for organisations and users. Data is distributed under certain policies that determine which actions are allowed and in which circumstances. Assessing which policies propagate to the output of a process is one crucial problem. Given a description of the policies and the data flow steps, a huge number of propagation rules must be specified and computed (the number of policies times the number of actions). In this paper we provide a method to obtain an abstraction that significantly reduces the number of rules. We use the Datanode ontology, a hierarchical organisation of the possible relations between data objects, to compact the knowledge base to a set of more abstract rules. After defining the notion of a policy propagation rule, we show (1) a methodology to abstract policy propagation rules based on an ontology, (2) how effective this methodology is when using the Datanode ontology, and (3) how this ontology can evolve to better represent the behaviour of policy propagation rules.
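    The compaction idea can be illustrated as follows; a hypothetical sketch (relation and policy names invented, not the paper's rule set) in which a single rule stated on an abstract relation in the hierarchy covers all of its sub-relations:

```python
def propagates(rules, parent, policy, relation):
    """True iff `policy` propagates across `relation`, consulting rules
    stated on the relation itself or on any ancestor in the relation
    hierarchy (a child -> parent dict; the root maps to None)."""
    r = relation
    while r is not None:
        if (policy, r) in rules:
            return True
        r = parent.get(r)
    return False
```

    Here one abstract rule replaces a rule per concrete sub-relation, which is the source of the reduction in rule count.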

    Query-Based Multicontexts for Knowledge Base Browsing: An Evaluation

    In [7], we introduced the query-based multicontext theory, which makes it possible to define a virtual space of views on ontological data. Each view is then materialised as a formal context. While this formal context can be visualised in a usual formal concept analysis framework such as Conexp or ToscanaJ, [7] also briefly described how the approach allows the creation of a novel navigation framework for knowledge bases. The principle of this navigation is to support the user in defining pertinent views. The purpose of this article is to discuss the benefits of the browsing interface. We do so, on the one hand, by comparing the approach to other Formal Concept Analysis based frameworks and, on the other hand, by presenting a preliminary evaluation of the visualisation of formal contexts that compares the display of a lattice to two other approaches based on trees and graphs.

    Folksonomies and clustering in the collaborative system CiteULike

    We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tripartite graph whose nodes represent papers, users and tags, connected by individual tag assignments. We study the semantics of tags in order to uncover the hidden relationships between them. We find that the clustering coefficient reflects the semantic patterns among tags, providing useful ideas for the design of more efficient methods of data classification and spam detection.
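    The clustering coefficient in question can be computed, for instance, on a tag co-occurrence graph projected from the tripartite structure; a minimal stdlib sketch assuming an undirected graph given as an adjacency dict:

```python
def clustering(adj, v):
    """Local clustering coefficient of node v in an undirected graph
    (adj: node -> set of neighbours): the fraction of neighbour pairs
    that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
    return 2 * links / (k * (k - 1))
```

    A tag whose co-occurring tags also co-occur with each other (coefficient near 1) tends to sit in a semantically coherent cluster; a hub tag connecting unrelated topics scores near 0.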

    Improving case retrieval by enrichment of the domain ontology

    One way of processing case retrieval in a case-based reasoning (CBR) system is to use an ontology in order to generalise the target problem in a progressive way, then to adapt the source cases corresponding to the generalised target problem. This paper shows how enriching this ontology improves the retrieval and final results of the CBR system. An existing ontology is enriched by automatically adding new classes that refine the initial organisation of classes. The new classes come from a data mining process using formal concept analysis; additional data about ontology classes are collected explicitly for this process, and the formal concepts it generates are introduced into the ontology as new classes. The new, better-structured ontology enables a more fine-grained generalisation of the target problem than the initial ontology. These principles are tested within Taaable (http://taaable.fr), a CBR system that searches for cooking recipes satisfying constraints given by a user, or adapts recipes by substituting certain ingredients for others. The ingredient ontology of Taaable has been enriched with ingredient properties extracted from recipe texts.
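    The concept-mining step can be sketched on a toy ingredient context (a brute-force enumeration, feasible only for small contexts and not the system's actual FCA implementation); each resulting (extent, intent) pair is a candidate new class:

```python
from itertools import combinations

def concepts(ctx):
    """All formal concepts (extent, intent) of a small context
    (object -> attribute set), found as closures of the common
    attributes of every non-empty object subset."""
    all_attrs = frozenset(set().union(*ctx.values()))
    intents = {all_attrs}  # intent of the bottom concept
    for r in range(1, len(ctx) + 1):
        for objs in combinations(ctx, r):
            intents.add(frozenset(set.intersection(*(ctx[o] for o in objs))))
    return [({g for g, a in ctx.items() if i <= a}, set(i)) for i in intents]
```

    For instance, the concept grouping milk and cream by the shared properties "dairy" and "liquid" would become a new intermediate class between the existing ingredient classes.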

    Providing Alternative Declarative Descriptions for Entity Sets Using Parallel Concept Lattices

    We propose an approach for modifying a declarative description of a set of entities (e.g., a SPARQL query) for the purpose of finding alternative declarative descriptions for the entities. Such a shift in representation can help to gain new insights into the data, to discover related attributes, or to find a more concise description of the entities of interest. Furthermore, allowing the alternative descriptions to be close approximations of the original entity set leads to more flexibility in finding such insights. Our approach is based on the construction of parallel formal concept lattices over different sets of attributes for the same entities. Between the formal concepts in the parallel lattices, we define mappings which constitute approximations of the extent of the concepts. In this paper, we formalise the idea of two types of mappings between parallel concept lattices, provide an implementation of these mappings and evaluate their ability to find alternative descriptions in a scenario of several real-world RDF data sets. In this scenario we use descriptions of entities based on RDF classes and seek alternative representations based on properties associated with the entities.
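    The closure underlying such an alternative description can be sketched as follows; a toy illustration (not the paper's mappings), assuming entities are described by property sets, where the extent of the new description may be a strict superset of the original entity set, i.e. an approximation:

```python
def alt_description(ctx, entities):
    """Common attributes of `entities` in ctx (object -> attribute set),
    together with the extent of that description: the smallest concept
    extent containing `entities`, possibly a strict superset of it."""
    common = set.intersection(*(ctx[e] for e in entities))
    return common, {g for g, atts in ctx.items() if common <= atts}
```

    In the example below, describing the pair {e1, e2} by their shared property also admits e3; tolerating such over-approximation is what gives the mappings their flexibility.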